Include stats in IPC messages #302
vortex-ipc/src/messages.rs
Outdated
children,
},
)
}
}

/// Computes all stats and uses the results to create an ArrayStats table for the flatbuffer message
fn compute_and_build_stats<'a>(
This doesn't let the writer configure which stats are written. Can you think of a clean way to support that?
I think ViewContext is the right place to configure this, added some config there
I don't think that's right. ViewContext represents the context required to interpret an array view at read-time. The stats configuration is not part of that.
I would actually say this isn't the job of this file. This file can just serialise the array and any stats that it has already computed. The caller (in this case, the IPCWriter) can set/clear whatever stats it likes before serialisation.
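A minimal sketch of that split, assuming simplified stand-ins for the real types: `Array`, `IpcWriter`, `prepare`, `compute`, and `serialize_stats` here are hypothetical and are not the crate's actual API. The serialisation step only reports stats that are already present, while the writer decides which stats to compute or clear beforehand.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the crate's `Stat` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Stat {
    Min,
    Max,
    NullCount,
}

/// Hypothetical array: a payload plus whatever stats were already computed.
struct Array {
    values: Vec<i64>,
    stats: BTreeMap<Stat, i64>,
}

/// The serialiser only reports stats that are already present; it computes nothing.
fn serialize_stats(array: &Array) -> Vec<(Stat, i64)> {
    array.stats.iter().map(|(s, v)| (*s, *v)).collect()
}

/// Hypothetical computation of a single stat (only Min and Max are supported here).
fn compute(stat: Stat, values: &[i64]) -> Option<i64> {
    match stat {
        Stat::Min => values.iter().copied().min(),
        Stat::Max => values.iter().copied().max(),
        Stat::NullCount => None,
    }
}

/// The writer owns the policy: which stats end up on the wire.
struct IpcWriter {
    enabled: Vec<Stat>,
}

impl IpcWriter {
    fn prepare(&self, array: &mut Array) {
        // Clear any stat the writer does not want to send.
        array.stats.retain(|s, _| self.enabled.contains(s));
        // Compute any enabled stat that is still missing.
        for stat in &self.enabled {
            if !array.stats.contains_key(stat) {
                if let Some(v) = compute(*stat, &array.values) {
                    array.stats.insert(*stat, v);
                }
            }
        }
    }
}
```

With this shape, the message-building code never needs to know about stats configuration; it serialises whatever the caller left on the array.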
vortex-array/src/view.rs
Outdated
Self { encodings, stats }
}

pub fn set_stats(&mut self, to_enable: &[Stat]) {
Added this set_stats method so that we can default to including stats but allow a caller to smash over this with any combination of stats they want.
The default mechanism is not implemented because there is an open question as to what the right set of default encodings is. I think using default stats in the from() method below is reasonable behavior, given that it can be modified afterwards.
vortex-ipc/src/messages.rs
Outdated
_ => None,
};

let mut frequencies: HashMap<_, _> = to_compute
This map/collect pattern was introduced to avoid a bunch of if-contains/else branches. I'm not sure I like it better.
It's weird...
You can just create a mut fb::ArrayStatsArgs and then for each stat, get value as scalar, then match the stat and cast if required.
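A rough sketch of that suggestion, assuming a hand-written stand-in for the generated `fb::ArrayStatsArgs` (the real flatbuffer struct's fields and types may differ) and hypothetical `Scalar` and `Stat` enums. It starts from a default args value and fills one field per stat, casting where the scalar type doesn't match the field type.

```rust
/// Hypothetical scalar value type.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Bool(bool),
    U64(u64),
    I64(i64),
}

/// Hypothetical stand-in for the crate's `Stat` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stat {
    Min,
    Max,
    IsSorted,
    NullCount,
}

/// Hypothetical stand-in for the generated `fb::ArrayStatsArgs`.
#[derive(Debug, Default, PartialEq)]
struct ArrayStatsArgs {
    min: Option<i64>,
    max: Option<i64>,
    is_sorted: Option<bool>,
    null_count: Option<u64>,
}

fn build_stats_args(stats: &[(Stat, Scalar)]) -> ArrayStatsArgs {
    let mut args = ArrayStatsArgs::default();
    for (stat, value) in stats {
        match (stat, value) {
            (Stat::Min, Scalar::I64(v)) => args.min = Some(*v),
            (Stat::Max, Scalar::I64(v)) => args.max = Some(*v),
            (Stat::IsSorted, Scalar::Bool(v)) => args.is_sorted = Some(*v),
            (Stat::NullCount, Scalar::U64(v)) => args.null_count = Some(*v),
            // Cast when the scalar arrives as a non-negative signed integer.
            (Stat::NullCount, Scalar::I64(v)) if *v >= 0 => args.null_count = Some(*v as u64),
            // Any other stat/type combination is left unset.
            _ => {}
        }
    }
    args
}
```

One `match` per stat keeps the type handling in a single place and avoids the intermediate map.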
Actually I might have needed the eager merging of chunked stats.
OK, so the original reason I eagerly merged stats for ChunkedArray was that it didn't work otherwise; I just assumed that it wasn't implemented. There was an implementation, but it was subtly wrong, which meant that calculating Min would never work. The issue was fold vs reduce: fold takes an empty stats map as its base case, and if you merge an empty stats map with something that has a Min entry, the result has no Min entry, e.g. the following panics
Because the empty stats map isn't there for any reason other than to give us a base case if there are no stats, replacing the fold with a reduce/unwrap_or_else fixes the problem.
I knew I should have written tests for stats merging.
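The fold vs reduce difference can be illustrated with a small standalone sketch. This is not the crate's merge code: `StatsMap`, `merge`, and the two wrapper functions are hypothetical, and `merge` assumes the behaviour described above, where a stat survives only if both sides have it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `Stat` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Stat {
    Min,
    Max,
}

type StatsMap = HashMap<Stat, i64>;

/// Assumed merge semantics: a stat is kept only when both sides have it.
fn merge(a: StatsMap, b: &StatsMap) -> StatsMap {
    let mut out = StatsMap::new();
    for (stat, left) in a {
        if let Some(right) = b.get(&stat) {
            let merged = match stat {
                Stat::Min => left.min(*right),
                Stat::Max => left.max(*right),
            };
            out.insert(stat, merged);
        }
    }
    out
}

/// Buggy variant: the empty base case wipes out every stat on the first merge.
fn merge_with_fold(chunks: &[StatsMap]) -> StatsMap {
    chunks.iter().fold(StatsMap::new(), |acc, s| merge(acc, s))
}

/// Fixed variant: the first chunk seeds the result; empty input yields an empty map.
fn merge_with_reduce(chunks: &[StatsMap]) -> StatsMap {
    chunks
        .iter()
        .cloned()
        .reduce(|acc, s| merge(acc, &s))
        .unwrap_or_else(StatsMap::new)
}
```

Under these assumptions, folding two chunks that both carry Min returns a map with no Min entry, whereas the reduce version returns the smaller of the two values.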
I've included a mechanism that enables all of the statistics by default here, because the overhead they add to the flatbuffer message is relatively small as long as the arrays themselves are sufficiently large. I considered checking the length of each array and choosing a subset of stats based on its size (probably just dropping the two frequency arrays, because they're much larger than everything else), but decided against it for now. I don't expect us to frequently see arrays small enough for these stats to add significant wire overhead relative to the data.
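For reference, the size-based selection that was considered but not adopted could look roughly like the sketch below. The threshold `SMALL_ARRAY_LEN`, the function `stats_for_len`, and the variant names for the two frequency stats are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for the crate's `Stat` enum; the variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stat {
    Min,
    Max,
    IsSorted,
    NullCount,
    BitWidthFreq,
    TrailingZeroFreq,
}

/// Hypothetical cutoff below which the frequency arrays are not worth sending.
const SMALL_ARRAY_LEN: usize = 1024;

/// Picks the stats to write for an array of the given length.
fn stats_for_len(len: usize) -> Vec<Stat> {
    // The cheap scalar stats are always included.
    let mut stats = vec![Stat::Min, Stat::Max, Stat::IsSorted, Stat::NullCount];
    // The two frequency arrays are only included for sufficiently large arrays.
    if len >= SMALL_ARRAY_LEN {
        stats.extend_from_slice(&[Stat::BitWidthFreq, Stat::TrailingZeroFreq]);
    }
    stats
}
```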